Compiler Generated Multithreading to Alleviate Memory Latency
Abstract
Since the era of vector and pipelined computing, computational speed has been limited by memory access time. Faster caches and additional cache levels are used to bridge the growing gap between memory and processor speeds. With the advent of multithreaded processors, it becomes feasible to fetch data and compute concurrently in two cooperating threads. A technique is presented that generates these threads at compile time, taking into account the characteristics of both the program and the underlying architecture. The results have been evaluated on an explicitly parallel processor. For a number of common programs, the data-fetch thread allows the computation to continue without cache-miss stalls.
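The abstract describes splitting a computation into a data-fetch thread and a compute thread. The C sketch below illustrates that division of labour by hand, for a simple array sum. It is not the paper's compiler transformation or its target processor, and it rests on several assumptions: POSIX threads, the GCC/Clang builtin `__builtin_prefetch`, 64-byte cache lines (8 doubles), and a hypothetical run-ahead distance `RUN_AHEAD`. The names `fetch_thread`, `shared_state` and `sum_with_fetch_thread` are illustrative only. Prefetching from a second thread can only help when both threads share a cache, for example two hardware threads on one core.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define LINE_ELEMS 8      /* assumption: 64-byte line / sizeof(double); must be a power of 2 */
#define RUN_AHEAD  1024   /* hypothetical: max elements the fetch thread may run ahead */

typedef struct {
    const double *data;
    size_t n;
    atomic_size_t progress;   /* index the compute thread has reached */
} shared_state;

/* Data-fetch thread: issues prefetch hints for cache lines in
   [progress, progress + RUN_AHEAD) and never writes the data. */
static void *fetch_thread(void *arg) {
    shared_state *s = arg;
    size_t next = 0;
    for (;;) {
        size_t done = atomic_load_explicit(&s->progress, memory_order_relaxed);
        if (next < done)              /* fell behind: skip lines already consumed */
            next = done;
        if (next >= s->n)             /* nothing left to fetch */
            break;
        if (next >= done + RUN_AHEAD) { /* far enough ahead: let the compute thread catch up */
            sched_yield();
            continue;
        }
        __builtin_prefetch(&s->data[next], 0, 3);  /* read access, keep in cache */
        next += LINE_ELEMS;
    }
    return NULL;
}

/* Compute thread (the caller): sums the array and publishes its progress
   so the fetch thread can stay a bounded distance ahead. */
double sum_with_fetch_thread(const double *data, size_t n) {
    shared_state s = { .data = data, .n = n };
    atomic_init(&s.progress, 0);

    pthread_t helper;
    int started = (pthread_create(&helper, NULL, fetch_thread, &s) == 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
        if ((i & (LINE_ELEMS - 1)) == 0)   /* publish once per cache line */
            atomic_store_explicit(&s.progress, i, memory_order_relaxed);
    }
    atomic_store_explicit(&s.progress, n, memory_order_relaxed);  /* lets the helper exit */

    if (started)
        pthread_join(helper, NULL);
    return sum;
}
```

Compile with `-pthread`. Because the helper only issues prefetch hints, the result is the same whether or not it runs (if thread creation fails, the sum is computed without it); the run-ahead window merely limits how far ahead it fetches, reducing the risk of evicting lines the compute thread has not used yet.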
Similar resources
Compiling irregular applications for reconfigurable systems
Algorithms that exhibit irregular memory access patterns are known to show poor performance on multiprocessor architectures, particularly when memory access latency is variable. Many common data structures, including graphs, trees, and linked-lists, exhibit these irregular memory access patterns. While FPGA-based code accelerators have been successful on applications with regular memory access ...
A Multithreaded Runtime System With Thread Migration for Distributed Memory Parallel Computing
Multithreading is very effective at tolerating the latency of remote memory accesses in distributed memory parallel computers, but does nothing to reduce the number or cost of those memory accesses. Compiler techniques and runtime approaches, such as caching remote memory accesses and prefetching, are often used to reduce the number of remote memory accesses. Another approach to reduce the numb...
Latency Tolerance through Multithreading in Large-Scale Multiprocessors
In large-scale distributed-memory multiprocessors, remote memory accesses suffer significant latencies. Caches help alleviate the memory latency problem by maintaining local copies of frequently used data. However, they cannot eliminate the latency caused by first-time references and invalidations needed to enforce cache coherence. Multithreaded processors tolerate such latencies by rapidly switchi...
Compiler-Controlled Multithreading for Lenient Parallel Languages
Abstract: Tolerance to communication latency and inexpensive synchronization are critical for general-purpose computing on large multiprocessors. Fast dynamic scheduling is required for powerful non-strict parallel languages. However, machines that support rapid switching between multiple execution threads remain a design challenge. This paper explores how multithreaded execution can be address...
An Evaluation of Thread Migration for Exploiting Distributed Array Locality
Thread migration is one approach to remote memory accesses on distributed memory parallel computers. In thread migration, threads of control migrate between processors to access data local to those processors, while conventional approaches tend to move data to the threads that need them. Migration approaches enhance spatial locality by making large address spaces local, but are less adept at ex...
Journal title:
- J. UCS
Volume 6, Issue
Pages -
Publication date: 2000